1. INTRODUCTION 

Land use and land cover (LULC) classification is a critical technique for assessing global change at various
spatiotemporal scales [1]. Land use and land cover change (LULCC) is a pervasive, accelerating and substantial
process driven by human activity, and it frequently results in changes that directly affect humans. The effects
of LULCC on ecosystem sustainability are a growing focus of global change research [2]. There is a continuing
need for regional LULC maps and information for a variety of purposes, including change detection [3],
planning and monitoring of the urban environment [4], disaster monitoring, landscape planning, resource
management, site suitability analysis and ecological studies [5], and biological investigation [6].
Non-parametric machine learning (ML) classifiers such as random forests (RF) and support vector machines
(SVMs) [7] have traditionally been used for geospatial classification because they are easy to use.

The focus of this work is to identify the physical characteristics of the earth's surface (land cover) as well
as how the land is used (land use) in the twin cities of Odisha. This can be accomplished through field surveys
or through the analysis of satellite images (remote sensing) [6]. Field surveys are more thorough and
authoritative, but they are costly and often take a long time to complete. With recent advances in the space
sector and the growing availability of satellite images (both free and commercial), machine learning models [8]
have shown promising results in this field. Advances in sensor technology have produced constellations of
satellites [9] and airborne platforms from which a large volume of high-spatial-resolution remotely sensed
imagery is available. Landsat-8 [10] is currently in orbit; its Operational Land Imager (OLI) sensor acquires
images in nine spectral bands, six of which (blue, green, red, near-infrared, SWIR-1 and SWIR-2) are used for
classification in this paper. The contributions of this work are: i) land use and land cover classification
using machine learning models; ii) generation of the feature set from the raster using the shapefiles of the
training and test data; iii) design of an ensemble model that combines the outputs of the XGBoost and SVM
models; and iv) evaluation of the ensemble models in terms of user's, producer's and overall accuracy.


2. LITERATURE SURVEY 

Many studies have applied LULC analysis to remotely sensed data. Otukei and Blaschke [3] carried out land
cover mapping and land cover change assessment in Pallisa District, Uganda, for the period 1986 to 2001 using
decision trees (DTs), SVMs and maximum likelihood classifiers (MLCs). They explored the use of knowledge mining
to identify the bands and decision thresholds required for classification. The analysis assessed the efficiency
of the classification models and reported that land cover changes occur at an unpredictable pace.

Several image classification models are available for partitioning a multi-dimensional feature space into
homogeneous regions and labelling each region with one of the desired classes. Parametric classifiers assume
that the data are normally distributed and estimate the statistical parameters from the training data. The most
widely used parametric classifier is the maximum-likelihood classifier (MLC), which builds decision surfaces from
the mean and covariance of each class. In [11], MLC was first applied to IRS LISS-II images from 2001 and 2011
to classify them into eight classes, after which a novel methodological framework for post-classification
adjustment was used. This considerably increased the overall classification accuracy, from 67.84% to 82.75%
for 2001 and from 71.93% to 87.43% for 2011.
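
The MLC decision rule described above can be illustrated with a short sketch. This is a minimal example, not the implementation used in [11]: it assumes equal class priors, fits one multivariate Gaussian (mean vector μ_c and covariance Σ_c) per class from training pixels, and assigns each pixel x to the class with the largest discriminant g_c(x) = -0.5 ln|Σ_c| - 0.5 (x - μ_c)ᵀ Σ_c⁻¹ (x - μ_c). The class name `GaussianMLC` and the synthetic data are hypothetical.

```python
import numpy as np


class GaussianMLC:
    """Hypothetical minimal maximum-likelihood classifier (equal priors)."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.means_, self.inv_covs_, self.log_dets_ = [], [], []
        for c in self.classes_:
            Xc = X[y == c]
            # Class mean and covariance estimated from the training pixels
            mean = Xc.mean(axis=0)
            cov = np.cov(Xc, rowvar=False) + 1e-9 * np.eye(X.shape[1])  # tiny ridge for stability
            _, log_det = np.linalg.slogdet(cov)
            self.means_.append(mean)
            self.inv_covs_.append(np.linalg.inv(cov))
            self.log_dets_.append(log_det)
        return self

    def decision_function(self, X):
        # g_c(x) = -0.5*ln|S_c| - 0.5*(x - m_c)^T S_c^-1 (x - m_c) for every class c
        scores = []
        for mean, inv_cov, log_det in zip(self.means_, self.inv_covs_, self.log_dets_):
            d = X - mean
            mahalanobis_sq = np.einsum("ij,jk,ik->i", d, inv_cov, d)
            scores.append(-0.5 * log_det - 0.5 * mahalanobis_sq)
        return np.column_stack(scores)

    def predict(self, X):
        # Each pixel is assigned to the class with the largest discriminant
        return self.classes_[np.argmax(self.decision_function(X), axis=1)]


# Synthetic two-class example with three "bands" (illustrative data only)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0.10, 0.20, 0.30], 0.02, size=(100, 3)),
               rng.normal([0.30, 0.10, 0.40], 0.02, size=(100, 3))])
y = np.repeat([0, 1], 100)
clf = GaussianMLC().fit(X, y)
print("training accuracy:", (clf.predict(X) == y).mean())
```

Because the covariance enters the discriminant, the resulting decision surfaces are quadratic in general; this is what distinguishes MLC from a minimum distance classifier, which uses the class means only.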

Islam et al. [1] used Landsat TM and Landsat 8 OLI/TIRS images to examine land use changes in the Chunati
Wildlife Sanctuary (CWS) from 2005 to 2015. ArcGIS and ERDAS Imagine were used for the land use change
assessment, and supervised land use classification was carried out with the maximum likelihood technique. The
study found that the degraded forest area increased by about 256 ha over the ten years (2005-2015), an annual
rate of change of 25.56%. Non-parametric classifiers, in contrast, do not assume a particular data distribution
when separating a multi-dimensional feature space into classes. The most commonly used non-parametric
classifiers include decision trees [4], support vector machines (SVMs) [12] and expert systems.

ML algorithms have been widely used as pixel-based classifiers in remote sensing image analysis [6]. Grippa et
al. [13] describe a method for mapping urban land use at the street block level, with an emphasis on residential
use, that takes very-high-resolution satellite images and derived land-cover maps as input. Street blocks are
classified with a random forest (RF) classifier, which achieves accuracies of 84% and 79% for five and six
land-use classes, respectively. The RF classifier was applied over the cities of Dakar and Ouagadougou, which
together cover more than 1,000 km², using imagery with a spatial resolution of 0.5 m. In 2019, Jamali [7]
compared eight machine learning methods, implemented in the Waikato Environment for Knowledge Analysis (WEKA)
and the R programming language, for image classification in northern Iran. Machine learning techniques
[14]-[16] such as RF, SVM [17], [18], decision trees, K-nearest neighbors (KNN) [19] and principal component
analysis (PCA) [20] have been successfully applied in many application areas. In this work, we build an
ensemble model [21] of SVM and XGBoost [22] that achieves higher overall accuracy than the individual machine
learning models.


3. LULC CLASSIFICATION 
3.1. Study area 

Our study site is the twin cities of Odisha, i.e., Bhubaneswar and Cuttack, which are situated in the eastern
part of the state between 20° 15' N-20° 28' N latitude and 85° 52' E-85° 54' E longitude. In the European
Petroleum Survey Group (EPSG) registry, the study area falls in EPSG:32645, WGS 84 / UTM zone 45N. The area is
bounded by Jajpur and Dhenkanal districts to the north, Puri, Jagatsinghpur and Kendrapara districts to the
east, Ganjam district to the south, and Anugul, Boudh and Nayagarh districts to the west. Bhubaneswar, the
capital of Odisha, lies in Khurda district. The urban administrative area of the twin cities is considered for
the analysis.
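
The EPSG code quoted above can be checked from the coordinates alone: WGS 84 / UTM zones are 6° wide and numbered from 180° W, and northern-hemisphere zones map to EPSG:326xx (southern ones to EPSG:327xx). The sketch below assumes the standard zone grid and ignores the special zones around Norway and Svalbard; the helper names `dms_to_deg` and `utm_epsg` are hypothetical.

```python
import math


def dms_to_deg(degrees: int, minutes: float = 0.0, seconds: float = 0.0) -> float:
    """Convert degrees/minutes/seconds to decimal degrees (positive = N or E)."""
    return degrees + minutes / 60.0 + seconds / 3600.0


def utm_epsg(lat_deg: float, lon_deg: float) -> int:
    """EPSG code of the WGS 84 / UTM zone containing (lat, lon).

    Standard 6-degree zones only; the Norway/Svalbard exceptions are ignored.
    """
    zone = int(math.floor((lon_deg + 180.0) / 6.0)) + 1
    zone = min(max(zone, 1), 60)             # guard the lon = 180 edge case
    base = 32600 if lat_deg >= 0 else 32700  # 326xx = northern, 327xx = southern
    return base + zone


# South-west corner of the study area quoted in the text
print(utm_epsg(dms_to_deg(20, 15), dms_to_deg(85, 52)))  # 32645
```

Longitude 85° 52' E gives (85.87 + 180) / 6 ≈ 44.3, i.e. zone 45, which is consistent with EPSG:32645.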


3.2. Data acquisition 

To establish the land use and land cover (LU/LC) of the study area, Landsat-8 OLI data for 2020 have been used.
Six spectral bands are used: blue, green, red, near-infrared (NIR), shortwave infrared 1 (SWIR-1) and
shortwave infrared 2 (SWIR-2). These bands are used to classify the area into seven land use classes: river,
canal, pond, forest, urban, agricultural land and sand.
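
For reference, the six bands listed above correspond to OLI bands 2 to 7 of a Landsat-8 scene (band 1 is coastal aerosol, band 8 panchromatic and band 9 cirrus). A lookup such as the hypothetical one below keeps the band order and the class codes consistent through stacking, feature extraction and classification. The zero-based class codes are an assumption of this sketch, chosen because recent versions of XGBoost's scikit-learn wrapper expect labels 0 to n-1.

```python
# Landsat-8 OLI band numbers of the six bands used in this study, in stacking order
OLI_BANDS = {"blue": 2, "green": 3, "red": 4, "nir": 5, "swir1": 6, "swir2": 7}
BAND_ORDER = list(OLI_BANDS)  # dicts keep insertion order (Python 3.7+)

# The seven LULC classes; zero-based integer codes are an assumption of this sketch
LULC_CLASSES = ["river", "canal", "pond", "forest", "urban", "agricultural land", "sand"]
CLASS_CODES = {name: code for code, name in enumerate(LULC_CLASSES)}

print(BAND_ORDER)
print(CLASS_CODES["sand"])
```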

3.3. Image pre-processing 
3.3.1. Layer stacking 

After data acquisition, fundamental pre-processing operations are applied to the raw satellite images before
they are used for any further enhancement, interpretation or analysis. Layer stacking [23] is applied to combine
the individual band images into a single multiband image. The image after layer stacking is shown in
Figure 1(a).
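
Layer stacking amounts to placing co-registered single-band arrays along a new band axis. The sketch below is a minimal illustration with synthetic arrays; in practice each band would typically be read from its GeoTIFF (for example with rasterio's `src.read(1)`) and the stacked cube written back with the same georeferencing, which is omitted here. The function name `stack_bands` is hypothetical.

```python
import numpy as np


def stack_bands(bands):
    """Layer-stack equally sized 2-D band arrays into a (rows, cols, n_bands) cube."""
    shapes = {band.shape for band in bands}
    if len(shapes) != 1:
        raise ValueError(f"bands must share one pixel grid, got shapes {shapes}")
    return np.stack(bands, axis=-1)


# Six synthetic 4 x 5 bands standing in for blue, green, red, NIR, SWIR-1, SWIR-2
rng = np.random.default_rng(42)
bands = [rng.integers(0, 20000, size=(4, 5), dtype=np.uint16) for _ in range(6)]
cube = stack_bands(bands)
print(cube.shape)  # (4, 5, 6)
```

The shape check matters because stacking is only meaningful when all bands share one pixel grid; OLI bands 2 to 7 all have 30 m resolution, so they can be stacked directly.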


3.3.2. Atmospheric correction 

Atmospheric correction [1] is essential when working with images acquired at more than one time, because we not
only classify the images but also want to compare several images with one another. Its main aim here is to
convert the raster bands of the Landsat 8 images from digital numbers to reflectance. Atmospheric correction is
applied to the layer-stacked image shown in Figure 1(a). In our work, it is implemented using the dark object
subtraction (DOS) method, and the resulting image is shown in Figure 1(b).
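
The conversion and dark object subtraction steps can be sketched as follows. This is a simplified, hypothetical variant, not the exact tool used in this work: digital numbers are first converted to top-of-atmosphere reflectance with the standard Landsat-8 rescaling (ρ = M·Q_cal + A, divided by the sine of the sun elevation), and a per-band dark-object value, taken here as a very low percentile of each band, is then subtracted. Classic DOS1 implementations typically work in radiance and assume a small (about 1%) reflectance for the darkest objects instead. The coefficients 2.0e-5 and -0.1 are typical values from OLI MTL metadata; the actual values should be read from each scene.

```python
import numpy as np


def toa_reflectance(dn, mult, add, sun_elevation_deg):
    """Landsat-8 OLI digital numbers -> top-of-atmosphere reflectance.

    mult/add are REFLECTANCE_MULT_BAND_x / REFLECTANCE_ADD_BAND_x from the
    scene's MTL file; dividing by sin(sun elevation) corrects for sun angle.
    """
    rho = mult * dn.astype(np.float64) + add
    return rho / np.sin(np.deg2rad(sun_elevation_deg))


def dark_object_subtraction(refl, valid, dark_percentile=0.01):
    """Simplified DOS: subtract a per-band dark-object value, clip at zero.

    refl: (rows, cols, bands) reflectance; valid: (rows, cols) bool mask.
    The signal of the darkest pixels is attributed to atmospheric haze.
    """
    dark = np.percentile(refl[valid], dark_percentile, axis=0)  # one value per band
    corrected = np.clip(refl - dark, 0.0, None)
    corrected[~valid] = np.nan  # fill pixels stay undefined
    return corrected


# Synthetic 6-band scene; DN 0 marks fill pixels, as in Landsat products
rng = np.random.default_rng(7)
dn = rng.integers(7000, 20000, size=(50, 50, 6))
dn[0, 0, :] = 0
valid = np.all(dn > 0, axis=-1)
refl = toa_reflectance(dn, mult=2.0e-5, add=-0.1, sun_elevation_deg=55.0)
corrected = dark_object_subtraction(refl, valid)
print(np.nanmin(corrected), np.nanmax(corrected))
```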


(a) (b) 


Figure 1. Image pre-processing (a) after layer stacking and (b) after atmospheric correction 


3.3.3. Image composite 

Each band of a multispectral image can be displayed on its own as a grayscale image, or three bands can be
displayed together as a color composite image. The three primary colors of light are red, green and blue (RGB).
Computer screens can display an image made of three different bands by assigning a different primary color to
each band. When these three images are combined, the result is a color image in which the color of every pixel
is determined by the mix of red, green and blue at different brightness levels. The study area is cropped from
the image shown in Figure 1. Two color composite formats of the study area, false color and true color [24],
are depicted in Figures 2(a) and 2(b), respectively.
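
The two composites can be built by choosing which three bands drive the red, green and blue channels. The sketch below assumes the six-band cube is ordered blue, green, red, NIR, SWIR-1, SWIR-2 and uses the common NIR-red-green (color-infrared) combination for false color; the paper does not state which false color combination was used, so that choice and the 2-98 percentile stretch are assumptions. The helper names are hypothetical.

```python
import numpy as np


def percentile_stretch(band, low=2.0, high=98.0):
    """Linearly stretch one band to [0, 1] between two percentiles."""
    lo, hi = np.nanpercentile(band, [low, high])
    return np.clip((band - lo) / (hi - lo + 1e-12), 0.0, 1.0)


def composite(cube, band_order):
    """Build an RGB composite (rows, cols, 3) from three bands of a cube."""
    return np.dstack([percentile_stretch(cube[..., i]) for i in band_order])


# Positions in the six-band cube: 0 blue, 1 green, 2 red, 3 NIR, 4 SWIR-1, 5 SWIR-2
TRUE_COLOR = (2, 1, 0)   # red, green, blue displayed as R, G, B
FALSE_COLOR = (3, 2, 1)  # NIR, red, green displayed as R, G, B (vegetation appears red)

rng = np.random.default_rng(3)
cube = rng.random((40, 40, 6))
true_rgb = composite(cube, TRUE_COLOR)
false_rgb = composite(cube, FALSE_COLOR)
print(true_rgb.shape, false_rgb.min(), false_rgb.max())
```

Either array can be displayed directly, for example with matplotlib's `plt.imshow(true_rgb)`. The stretch is needed because raw reflectance values occupy only a narrow part of the display range.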


(a) (b) 


Figure 2. Image composite of twin cities of Odisha in (a) false color and (b) true color 


4. METHOD 

The proposed classification method consists of three major stages. Firstly, the study area is 
identified, and data acquisition is performed. In the next stage, image pre-processing is carried out with layer 
stacking and atmospheric correction. Finally, ensemble classification is carried out for thematic LULC 
change analysis. 
Image classification is an automated approach for classifying raster data from satellite images [25], [26],
airborne images and drone images. It typically involves analysing one or more images and applying statistical
rules to determine the land cover class of each pixel in an image. In this paper, supervised classification
algorithms are applied for LULC classification. Samples of all classes of interest are selected to prepare the
training and test datasets. The proposed model is depicted in Figure 3. The training data are used to train the
classifier, whereas the test data are used to validate the model. The raster package of R is used to extract
features from the raster using the shapefiles of the training and test data. The extracted features are used as
input to the classifiers, which are implemented with Python's scikit-learn (sklearn) library using default
parameter settings. For the ensemble model, a voting classifier is used with an equal weight of 0.5 for both the
SVM and XGBoost models.

After the classes are defined, the next step is the training stage, in which multiple training areas are
identified for each required land cover class. A sufficient sample size is needed to obtain accurate
statistical descriptors of the training data. The LULC maps generated by the different machine learning models
are shown in Figure 4: the output maps of the minimum distance, RF and hybrid (SVM+XGBoost) models are shown in
Figures 4(a), 4(b) and 4(c), respectively.
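
The path from training areas to a LULC map can be sketched end to end with the minimum distance classifier. The sketch assumes the training polygons have already been rasterized into a label mask on the image grid (0 outside any training area, k inside an area of class k), which replaces the shapefile-based extraction in R used in this work. scikit-learn's `NearestCentroid` assigns each pixel to the class with the nearest mean (Euclidean distance by default), i.e. a minimum distance to means classifier. The helper names and synthetic scene are hypothetical.

```python
import numpy as np
from sklearn.neighbors import NearestCentroid


def extract_training_pixels(cube, label_mask):
    """Return (X, y) for the pixels inside training areas.

    label_mask is a (rows, cols) integer array on the same grid as the cube:
    0 = outside any training area, k > 0 = training area of class k.
    """
    inside = label_mask > 0
    return cube[inside], label_mask[inside]


def classify_image(model, cube):
    """Predict a class for every pixel and reshape the labels into a map."""
    rows, cols, n_bands = cube.shape
    return model.predict(cube.reshape(-1, n_bands)).reshape(rows, cols)


# Synthetic 30 x 30 six-band scene: left half "class 1", right half "class 2"
rng = np.random.default_rng(1)
cube = np.empty((30, 30, 6))
cube[:, :15] = rng.normal(0.10, 0.01, size=(30, 15, 6))
cube[:, 15:] = rng.normal(0.40, 0.01, size=(30, 15, 6))

# Two small training areas, one per class
label_mask = np.zeros((30, 30), dtype=int)
label_mask[2:8, 2:8] = 1
label_mask[20:26, 20:26] = 2

X, y = extract_training_pixels(cube, label_mask)
classes, counts = np.unique(y, return_counts=True)
print(dict(zip(classes.tolist(), counts.tolist())))  # {1: 36, 2: 36}

# Minimum distance to class means, then per-pixel prediction over the whole scene
mdc = NearestCentroid().fit(X, y)
lulc_map = classify_image(mdc, cube)
print(lulc_map.shape)
```

Printing the per-class sample counts is a simple way to check that every class has enough training pixels before fitting; the same `classify_image` helper works with any fitted scikit-learn classifier, including a voting ensemble.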


[Flowchart: Landsat 8 data acquisition from USGS → image pre-processing in QGIS (layer stacking; geometric,
radiometric and atmospheric correction) → training data shapefile creation → feature extraction using the
raster library → ensemble classifier (SVM and XGBoost)]

Figure 3. Proposed model 
Figure 4. Classification using (a) minimum distance, (b) random forest (RF), and (c) SVM+XGBoost 


5. RESULTS AND DISCUSSION 

Accuracy assessment is then carried out to determine how good the map is. If the accuracy assessment shows that
the land use land cover map is valid, the resulting map can be used in different ways, such as thematic maps,
output tables or statistics for the various land cover classes, and digital data files amenable to inclusion in
a geographic information system (GIS). In supervised classification, an error occurs when a pixel that belongs
to one class is assigned to another class. The question is how to test for such errors; there are two basic
approaches: visual control and quantitative control.

Visual control is a visual assessment of the results of supervised or unsupervised classification. Once the
results have passed visual control and look plausible, the quantitative accuracy assessment can be carried out.
Accuracy assessment with error metrics plays a vital role in any classification task, including LULC
classification. It is done by calculating accuracy measures such as overall accuracy (OA), user's accuracy
(UA), producer's accuracy (PA) and the Kappa coefficient from the confusion matrix. Table 1 compares the
machine learning models in terms of training accuracy, overall (test) accuracy and the Kappa index [27], [28];
higher values indicate better classification. Table 2 lists the PA and UA of the ML classifiers for each of the
seven classes.
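
The four measures can all be computed from one confusion matrix. The sketch below uses the row = reference, column = predicted layout of `sklearn.metrics.confusion_matrix`: PA is the diagonal divided by the reference (row) totals, UA is the diagonal divided by the mapped (column) totals, OA is the trace divided by the total number of test pixels, and Kappa = (OA - p_e) / (1 - p_e), where p_e is the agreement expected by chance. A class with no reference or no mapped pixels gives a 0/0 ratio, which the sketch returns as NaN. The function name `accuracy_report` and the toy matrix are hypothetical.

```python
import numpy as np


def accuracy_report(cm):
    """Overall, producer's and user's accuracy plus Cohen's kappa.

    cm[i, j] = number of test pixels of reference class i mapped to class j
    (the row = true, column = predicted layout of sklearn's confusion_matrix).
    """
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    diag = np.diag(cm)
    ref_totals = cm.sum(axis=1)  # reference (ground truth) pixels per class
    map_totals = cm.sum(axis=0)  # pixels assigned to each class on the map
    with np.errstate(divide="ignore", invalid="ignore"):
        pa = diag / ref_totals   # producer's accuracy: 1 - omission error
        ua = diag / map_totals   # user's accuracy: 1 - commission error
    oa = diag.sum() / n                           # overall accuracy
    p_e = (ref_totals * map_totals).sum() / n**2  # agreement expected by chance
    kappa = (oa - p_e) / (1.0 - p_e)
    return oa, pa, ua, kappa


# Two-class toy matrix (illustrative numbers, not from this study)
oa, pa, ua, kappa = accuracy_report([[50, 5], [10, 35]])
print(round(oa, 4), pa.round(4), ua.round(4), round(kappa, 4))
```

For the toy matrix, OA = 85/100 = 0.85 and p_e = (55·60 + 45·40)/100² = 0.51, so Kappa = 0.34/0.49 ≈ 0.6939: Kappa is noticeably lower than OA because it discounts the agreement expected by chance.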


Table 1. Model accuracy and Kappa coefficient of ML classifiers 


Model             Train accuracy (fraction)   Test accuracy, OA (%)   Kappa coefficient
Minimum distance  0.8592                      51.77                   0.413
KNN               0.9952                      93.0782                 0.8738
LR                0.9404                      93.0049                 0.8682
DT                1.0000                      92.4922                 0.8631
SVM               0.9514                      93.3208                 0.8742
XGBoost           0.9957                      93.5360                 0.8818
Extra tree        0.9980                      93.2064                 0.8760
RF                0.8776                      88.6330                 0.7872
SVM+XGBoost       0.9920                      93.5635                 0.8824
RF+XGBoost        0.9907                      93.4902                 0.8806


Table 2. Producer's accuracy (PA) and user's accuracy (UA) of ML classifiers (%)
Model        River         Canal         Pond          Forest        Urban         Agri. land    Sand
             PA     UA     PA     UA     PA     UA     PA     UA     PA     UA     PA     UA     PA     UA
KNN          99.60  99.98  87.28  90.49  69.23  49.32  99.93  73.54  77.33  98.00  13.07  38.89  100    99.02
LR           96.68  100    NaN    0.00   NaN    0.00   97.90  85.03  64.34  99.13  12.57  5.83   100    99.51
DT           99.41  99.95  82.02  79.75  53.89  44.29  99.51  72.87  81.20  96.73  11.91  40.00  99.92  96.17
SVM          96.68  100    0.00   0.00   0.00   0.00   98.10  85.46  69.86  98.60  23.65  21.94  100    99.59
XGBoost      99.38  99.97  76.20  82.52  62.87  47.95  100    76.60  84.14  98.00  13.13  37.78  99.92  99.51
Extra tree   99.50  99.98  84.41  88.04  79.69  46.58  99.90  74.38  80.33  98.00  12.83  39.72  100    99.35
RF           98.08  99.73  NaN    0.00   0.00   0.00   97.21  92.53  42.28  100    NaN    0.00   NaN    0.00
SVM+XGBoost  99.60  99.96  82.16  86.20  62.89  45.66  100    77.16  80.31  98.73  10.54  28.33  99.92  99.51
RF+XGBoost   99.13  99.97  89.05  74.85  66.67  43.84  99.82  78.54  80.06  99.07  6.95   17.50  99.92  99.51


6. CONCLUSION 

Land use and land cover classification helps to explore the change dynamics of a city. Although the maximum
likelihood classifier is widely used, it does not always achieve the desired classification accuracy. This work
presented pixel-based LULC classification using various ML models, which can help researchers identify suitable
classifiers and evaluation metrics. Atmospherically corrected Landsat 8 data were used to improve the accuracy
of the LULC classification. An ensemble model combining the outputs of SVM and extreme gradient boosting
(XGBoost) is proposed. Tables 1 and 2 show its efficacy: the SVM+XGBoost ensemble achieves the highest overall
accuracy (93.5635%) and Kappa coefficient (0.8824) among the models tested, while its user's and producer's
accuracy remain comparable to those of the best individual classifiers.